Lab: CNN Visualization¶

Mark Brubaker & Elias Mann

In this lab we will be using diifferent techniques for visualiizing and analyzing the inner workings of a convolutional neural network. We will use class activatiion mapping, gradient ascent, circuits, maximal exitation image generation for multi channel filters, and individual channel filter visualization to better understand the VGG19 network.

In [ ]:
import numpy as np
import scipy as sp
import seaborn as sns
import numpy as np
import matplotlib.pyplot as plt
from PIL import Image
import os

# silence tensorflow warnings
os.environ['TF_CPP_MIN_LOG_LEVEL'] = '3'

# import vgg19 pretrained model from keras
import tensorflow as tf
from tensorflow import keras
from keras.applications.vgg19 import VGG19, preprocess_input, decode_predictions
from keras import models
from keras import backend as K

# getting rid of the warning messages about optimizer graph
tf.get_logger().setLevel('ERROR')
tf.autograph.set_verbosity(3)
In [ ]:
# print Tensorflow and CUDA information
print("Num GPUs Available: ", len(tf.config.experimental.list_physical_devices('GPU')))
print("Num CPUs Available: ", len(tf.config.experimental.list_physical_devices('CPU')))
print(f"Tensorflow version: {tf.__version__}")
print(f"Keras version: {keras.__version__}")
 
if tf.test.gpu_device_name():
    gpu_devices = tf.config.list_physical_devices('GPU')
    details = tf.config.experimental.get_device_details(gpu_devices[0])
    name = details.get('device_name', 'Unknown GPU')
    
    print(f"Using {name}")
else:
    print("No GPU found")
Num GPUs Available:  1
Num CPUs Available:  1
Tensorflow version: 2.11.0
Keras version: 2.11.0
Using NVIDIA GeForce RTX 3070 Laptop GPU

Choosing a Model¶

For this analysis we will be visualizing the VGG19 model. VGG19 is a convolutional neural network that is trained on the ImageNet dataset. We chose this model as it has been trained on a large dataset, and does not have residual connections only uses 19 layers and 3X3 convolutional filters. This makes the curcuits easier to create, and model easier to visualize and understand.

In [ ]:
# block3_conv2 filter 16 is good for pier.jpg
# block5_conv2 filter 16 is good for dog.jpg
layer_name = 'block3_conv2'
filter_num = 16

# load the model
model = VGG19(weights='imagenet', include_top=True)

# set VGG to be frozen
for layer in model.layers:
    layer.trainable = False

intermediate_layer_model = models.Model(inputs=model.input,
                                        outputs=model.get_layer(layer_name).output)

model.summary()
Model: "vgg19"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
=================================================================
 input_2 (InputLayer)        [(None, 224, 224, 3)]     0         
                                                                 
 block1_conv1 (Conv2D)       (None, 224, 224, 64)      1792      
                                                                 
 block1_conv2 (Conv2D)       (None, 224, 224, 64)      36928     
                                                                 
 block1_pool (MaxPooling2D)  (None, 112, 112, 64)      0         
                                                                 
 block2_conv1 (Conv2D)       (None, 112, 112, 128)     73856     
                                                                 
 block2_conv2 (Conv2D)       (None, 112, 112, 128)     147584    
                                                                 
 block2_pool (MaxPooling2D)  (None, 56, 56, 128)       0         
                                                                 
 block3_conv1 (Conv2D)       (None, 56, 56, 256)       295168    
                                                                 
 block3_conv2 (Conv2D)       (None, 56, 56, 256)       590080    
                                                                 
 block3_conv3 (Conv2D)       (None, 56, 56, 256)       590080    
                                                                 
 block3_conv4 (Conv2D)       (None, 56, 56, 256)       590080    
                                                                 
 block3_pool (MaxPooling2D)  (None, 28, 28, 256)       0         
                                                                 
 block4_conv1 (Conv2D)       (None, 28, 28, 512)       1180160   
                                                                 
 block4_conv2 (Conv2D)       (None, 28, 28, 512)       2359808   
                                                                 
 block4_conv3 (Conv2D)       (None, 28, 28, 512)       2359808   
                                                                 
 block4_conv4 (Conv2D)       (None, 28, 28, 512)       2359808   
                                                                 
 block4_pool (MaxPooling2D)  (None, 14, 14, 512)       0         
                                                                 
 block5_conv1 (Conv2D)       (None, 14, 14, 512)       2359808   
                                                                 
 block5_conv2 (Conv2D)       (None, 14, 14, 512)       2359808   
                                                                 
 block5_conv3 (Conv2D)       (None, 14, 14, 512)       2359808   
                                                                 
 block5_conv4 (Conv2D)       (None, 14, 14, 512)       2359808   
                                                                 
 block5_pool (MaxPooling2D)  (None, 7, 7, 512)         0         
                                                                 
 flatten (Flatten)           (None, 25088)             0         
                                                                 
 fc1 (Dense)                 (None, 4096)              102764544 
                                                                 
 fc2 (Dense)                 (None, 4096)              16781312  
                                                                 
 predictions (Dense)         (None, 1000)              4097000   
                                                                 
=================================================================
Total params: 143,667,240
Trainable params: 0
Non-trainable params: 143,667,240
_________________________________________________________________

We start by classifying a few images to ensure that the model is working properly. As you can see the model is assigns plausable classifications to the images.

In [ ]:
path = 'images/'

# Set up the plot grid
fig, axs = plt.subplots(1, len(os.listdir(path)), figsize=(20, 5))

# Iterate through the images and plot them
for i, filename in enumerate(os.listdir(path)):
    img = Image.open(os.path.join(path, filename)).resize((224, 224))
    x = np.array(img)
    x = np.expand_dims(x, axis=0)
    x = preprocess_input(x)
    preds = model.predict(x)
    axs[i].imshow(img)
    axs[i].set_title(f'Predicted: {decode_predictions(preds, top=1)[0][0][1]} \n Actual: {filename.split(".")[0]}')
    axs[i].axis('off')

plt.show()
1/1 [==============================] - 7s 7s/step
1/1 [==============================] - 0s 21ms/step
1/1 [==============================] - 0s 26ms/step

Selecting A Filter¶

VGG19 has 19 layers, and we will be analyzing a multi-channel filter from the the 6th called 'block3_conv2' layer since it is an early mid-level layer - it has several convolutional layers before and after it.

In [ ]:
#the next two functions are from https://github.com/8000net/LectureNotesMaster/blob/master/04%20LectureVisualizingConvnets.ipynb Dr. Eric Larson's lecture notes

# this funtion scales the image to be between 0 and 255, there are the sizes we will use for display and also the sizes used for VGG19
def prepare_image_for_display(img, norm_type='max'):
    if norm_type == 'max':
        # min/max scaling, best for regular images
        new_img = (img - img.min()) / (img.max()-img.min())
    else:
        # std scaling, best when we are unsure about large outliers
        new_img = ((img - img.mean()) / (img.std() +1e-3))*0.15 + 0.5
        
    new_img *= 255
    new_img = np.clip(new_img, 0, 255)
    
    if len(new_img.shape)>3:
        new_img = np.squeeze(new_img)

        
    return new_img.astype('uint8')


# this function generates the pattern for a given layer and filter using gradient ascent
def generate_pattern(layer_name, filter_index, size=224):
    # Build a model that outputs the activation
    # of the nth filter of the layer considered.
    layer_output = model.get_layer(layer_name).output

    # Isolate the output 
    new_model = models.Model(inputs=model.input, outputs=layer_output)
    
    # We start from a gray image with some uniform noise
    input_img_data = np.random.random((1, size, size, 3)) * 20 + 128.
    
    # Create a placeholder for the input image
    I_start = tf.Variable(input_img_data, name='image_var', dtype = 'float64')

    # Preprocess the input image using the same preprocessing as the original images in the dataset
    I = preprocess_input(I_start) # only process once
    # Run gradient ascent for 40 steps
    eta = 1.
    for i in range(40):
        with tf.GradientTape(watch_accessed_variables=False) as tape:
            # Watch the input image to compute its gradient with respect to the loss
            tape.watch(I)
            # get variable to maximize 
            model_vals = new_model(I) 

            # Compute the mean activation of the specified filter
            loss = tf.reduce_mean(model_vals[:, :, :, filter_index])

        # Compute the gradient of the input picture w.r.t. this loss
        # add this operation input to maximize
        grad_fn = tape.gradient(loss, I)
        # Normalization trick: we normalize the gradient
        grad_fn /= (tf.sqrt(tf.reduce_mean(tf.square(grad_fn))) + 1e-5) # mean L2 norm
        I += grad_fn * eta # one iteration of maximizing
        
    # return the numpy matrix so we can visualize 
    img = I.numpy()
    return prepare_image_for_display(img, norm_type='std')

# prepare the images for VGG19
def prepare_image_for_vgg(img):
    # convert to numpy array
    x = np.array(img)

    
    if x.ndim == 2:  # grayscale image
        x = np.stack([x, x, x], axis=-1)
    elif x.shape[2] == 4:  # RGBA image
        x = Image.fromarray(x).convert('RGB')
        x = np.array(x)
    x = np.expand_dims(x, axis=0)
    x = preprocess_input(x)
    return x

def plot_heatmap_on_image(image, heatmap, ax, cmap='viridis', alpha=0.5):
    ax.imshow(image)
    im = ax.imshow(heatmap, cmap=cmap, alpha=alpha, extent=(0, 224, 224, 0))
    plt.colorbar(im, ax=ax, shrink=0.7)

We visualize the filter using gradient ascent. By observng the visualization of this filter, we hypothesize that it detects horizonaltal line. We will test this by using Class Activcation Mapping (CAM) to see if the filter is reactive to horizontal lines on the image of a pier. We expect the filter to react strongly to the horizonal lines on the pier, and weakly to the vertical lines on the pier.

In [ ]:
def plot_filter_image_and_activations(layer_name, filter_num, img_path):

    # Load the image
    img = Image.open(img_path).resize((224, 224))
    img_array = prepare_image_for_vgg(img)

    # Get the predictions
    preds = model.predict(img_array, verbose=0)
    label = decode_predictions(preds, top=1)[0][0][1]

    # Get the activations for the filter
    activation = intermediate_layer_model.predict(img_array, verbose=0)
    heatmap = activation[0, :, :, filter_num]
    heatmap = sp.ndimage.zoom(heatmap, (224/heatmap.shape[0], 224/heatmap.shape[1]), order=1)

    fig, axes = plt.subplots(1, 3, figsize=(20, 10))

    # Display the filter
    axes[0].imshow(generate_pattern(layer_name, filter_num))
    axes[0].set_title(f'{layer_name} filter {filter_num}')
    axes[0].axis('off')

    # Display the image
    axes[1].imshow(img)
    axes[1].set_title(f'Image: {os.path.basename(img_path)}\nPredicted class: {label}')
    axes[1].axis('off')

    # Display the activations as their own image
    axes[2].imshow(heatmap)
    axes[2].set_title(f'Activations for filter {filter_num}')
    axes[2].axis('off')

    plt.show()

img_path = 'pier.jpg'
# img_path = 'dog.jpg'
plot_filter_image_and_activations(layer_name, filter_num, img_path)

The filter reacts as we expected. There are strong activations over parts of the image that have long horizontal lines.

In [ ]:
# load the ImageNet sample images from /ImageNet_Samples
# each image is class_name.JPEG

# load the images
path = 'ImageNet_Samples/'
images = []

for filename in os.listdir(path):
    img = Image.open(os.path.join(path, filename)).resize((224, 224))
    image_arr = prepare_image_for_vgg(img)
    if image_arr.shape != (1, 224, 224, 3):
        print(f"Skipping {filename} due to incorrect shape: {image_arr.shape}")
        continue
    images.append(image_arr)

# predict for each image and record the activations of block4_conv4 filter {filter_num}
# get the 5 images that excite that filter the most

activations = []
for img_arr in images:
    activation = intermediate_layer_model.predict(img_arr, verbose=0)
    activations.append(activation)

Next we run a dataset through VGG19 which contains one example of each class in the ImageNet dataset. This will help us get a diverse picture of what the filter is looking for and what type of images it is most excited by.

In [ ]:
def l2_norm(activation):
    """
    Computes the L2 norm of an activation.
    :param activation: The 3D numpy array representing the activation
    :return: The L2 norm value
    """
    return np.sqrt(np.sum(np.square(activation)))

# Find the L2 norm of the activations for the specified filter
l2_norm_activations = [l2_norm(activation[:, :, :, filter_num]) for activation in activations]

# Get the indices of the top 5 images that excite the filter the most
top_5_indices = np.argsort(l2_norm_activations)[-5:][::-1]

# Plot the top 5 images with heatmaps using the L2 norm activation
fig, axes = plt.subplots(1, 5, figsize=(20, 5))
for i, index in enumerate(top_5_indices):
    img = Image.open(os.path.join(path, os.listdir(path)[index])).resize((224, 224))
    img_array = images[index]
    preds = model.predict(img_array, verbose=0)
    label = decode_predictions(preds, top=1)[0][0][1]
    activation = activations[index]
    heatmap = activation[0, :, :, filter_num]
    heatmap = sp.ndimage.zoom(heatmap, (224/heatmap.shape[0], 224/heatmap.shape[1]), order=1)

    plot_heatmap_on_image(img, heatmap, axes[i], cmap='coolwarm', alpha=0.5)
    axes[i].set_title(f'Predicted class: {label}\nL2 norm activation: {l2_norm_activations[index]:.4f}')
    axes[i].axis('off')

plt.show()

These images which strongly excite the filter are all of objects that have horizontal lines in them. This is a good sign that the filter is looking for horizontal lines.

Analyzing Each Channel of the Filter¶

In [ ]:
def get_filter_coefficients(layer_name, filter_num):
    # Get the layer object
    layer = model.get_layer(layer_name)

    # Get the layer's weights
    weights = layer.get_weights()[0]

    # Get the filter coefficients for the specified filter number
    filter_coefficients = weights[:, :, :, filter_num]

    return filter_coefficients

def get_top_six_strongest_filters(layer_name, filter_num):
    filter_coefficients = get_filter_coefficients(layer_name, filter_num)
    
    # Calculate the L2 norm for each input filter
    l2_norms = [l2_norm(filter_coefficients[:, :, i]) for i in range(filter_coefficients.shape[2])]
    
    # Get the indices of the top six strongest input filters
    top_six_indices = np.argsort(l2_norms)[-6:][::-1]
    
    return filter_coefficients[:, :, top_six_indices], top_six_indices

def categorize_filter(filter_coefficients):
    if np.mean(filter_coefficients) > 0:
        return "mostly excitatory"
    else:
        return "mostly inhibitory"

top_six_filters, top_six_filter_indices = get_top_six_strongest_filters(layer_name, filter_num)


fig, axes = plt.subplots(1, 6, figsize=(20, 5))

for i in range(6):
    filter_coefficients = top_six_filters[:, :, i]
    category = categorize_filter(filter_coefficients)
    
    # Normalize the filter coefficients for visualization
    normalized_filter = (filter_coefficients - filter_coefficients.min()) / (filter_coefficients.max() - filter_coefficients.min())
    
    axes[i].imshow(normalized_filter, cmap='gray')

    # annotate the filter heatmap
    for j in range(3):
        for k in range(3):
            text = axes[i].text(k, j, f'{normalized_filter[j, k]:.2f}',
                               ha="center", va="center", color="r")


    axes[i].set_title(f'Input filter {top_six_filter_indices[i] + 1}\n{category}')
    axes[i].axis('off')

plt.show()

In the code above we visualize the top 6 3x3 convoliutional filter channels from the 'block3_conv2' layer. We do so by first gettting the filter coeffecients for each channel in the model. We then find the top 6 strongest channel filters using l2 norm. Finally we visualise each channel and display its coeffeicents. The darker pixels represent smaller (more negative) weights, while the lighter pixels represent larger (more positive) weights. We categorize each of the top six input filters as "mostly inhibitory" if it has a negative impact on the activations of the filter, and "mostly excitatory" if it has a positive impact on the activations of the filter. We can see that the top 6 filters are mostly excitatory, and that the most of the channels resemble horizontal edge detectors. We think the difference between 218 & 167 as opposed to 20 & 251 is the strength of the edge detecion. Having strong edges is mostly excitatory, while if theoes edges are not well defined it would inhibit the filter. This likely allows our filter to only detect the stringest of edges.

Visualiizing What Image Maximally Excites Each Channel Filter¶

In [ ]:
def generate_pattern_channel(layer_name, channel, size=224):
    layer_output = model.get_layer(layer_name).output
    new_model = models.Model(inputs=model.input, outputs=layer_output)
    input_img_data = np.random.random((1, size, size, 3)) * 20 + 128.

    I_start = tf.Variable(input_img_data, name='image_var', dtype='float64')
    I = preprocess_input(I_start)

    eta = 1.
    for i in range(40):
        with tf.GradientTape(watch_accessed_variables=False) as tape:
            tape.watch(I)
            model_vals = new_model(I)
            loss = tf.reduce_mean(model_vals[:, :, :, channel])

        grad_fn = tape.gradient(loss, I)
        grad_fn /= (tf.sqrt(tf.reduce_mean(tf.square(grad_fn))) + 1e-5)
        I += grad_fn * eta

    img = I.numpy()
    return prepare_image_for_display(img, norm_type='std')

# layer before
prev_layer = 'block3_conv1'

# uses the same filters as selected int the last section


# plot each filter next to the corresponding channel
fig, axes = plt.subplots(2, 6, figsize=(20, 10))

for i, channel in enumerate(top_six_filter_indices):
    # Display the filter
    filter_coefficients = top_six_filters[:, :, i]
    normalized_filter = (filter_coefficients - filter_coefficients.min()) / (filter_coefficients.max() - filter_coefficients.min())

    # annotate the filter heatmap
    for j in range(3):
        for k in range(3):
            text = axes[0, i].text(k, j, f'{normalized_filter[j, k]:.2f}',
                               ha="center", va="center", color="r")

    axes[0, i].imshow(normalized_filter, cmap='gray')
    axes[0, i].set_title(f'Filter {top_six_filter_indices[i] + 1}')
    axes[0, i].axis('off')
    
    # Display the pattern that maximally excites the filter
    img = generate_pattern_channel(prev_layer, channel)
    axes[1, i].imshow(img)
    axes[1, i].set_title(f'Maximal excitation {top_six_filter_indices[i] + 1}\n{category}')
    axes[1, i].axis('off')

plt.tight_layout()
plt.show()

In the code above, we use gradient ascent to find the image that maximally excites each channel filter. Each of the channel filters images make sense for our multi-channel filter. As we know, our selected multi-channel filter is sensitive to horizontal lines, and we can observe that many of the maximally excited images seem to correspond to images with horizontal lines. The second channel filter that we visualized shows vertical lines that are more spaced out. This is reasonable considering that our filter also has a light overlay of these vertical stripes.. The fourth channel filter shows a distinction between the middle and right and left sides of the image, suggesting sensitivity to differences in the center and periphery of the scene. The other channel filters indicate activations for mostly horizontal and some diagonal lines, which further supports the idea that our selected multi-channel filter is mainly responsive to horizontal line features. Combined out selected filter seems to be most senstive to high-frequency horizontal lines. Given the nature of these preceding filters to detect light to dark edges as well as dark to light edges it better allows our fitler to be orientation invariant in the vertical direction. Although there is some sensitivity to vertical lines, it is not near as strong as the sensitivity to horizontal lines and not strong enough for this filter to be considered polysemantic. It makes sense how these filters form a circuit with our selected filter as together they form a path that reacts strongly to horizontal lines in images.